ABSTRACT


This paper addresses bottleneck diagnosis in regional economic systems. It presents a new self-organizing data mining
method and applies it to this diagnosis problem. The results show that the new method is more effective for diagnosis
than the GMDH method. Its algorithm is relatively simple: rather than relying on predefined models, it relies on the
expert's specification of the inputs and outputs drawn from a large body of data. The comparison between the two methods
shows that the accuracy of the new method is similar to that of the GMDH method, but the new method can also find
additional preconditions. This is its most prominent characteristic, and it makes the method promising for diagnosis
in the field of management science.
INTRODUCTION


Data mining technology has received great attention from experts at home and abroad in recent years because it can
extract valuable, implicit knowledge from large volumes of data. It has also been widely used in the field of fault
diagnosis.


The GMDH-type neural network is an algorithm first proposed by Kondo, a Japanese scholar. It has been widely used in
economic management and social system prediction because it can objectively determine the hierarchy and the number of
hidden layers in the neural network and avoids the subjectivity of data partitioning in GMDH.


The core idea of the GMDH-type neural network is to identify a nonlinear system model by repeatedly screening
candidate combined models against the external criteria (accuracy criteria or compatibility criteria) of the neural
network and the GMDH algorithm. Applied to the diagnosis of bottlenecks in a regional economic system, this idea can
reveal the basic relationship between the most direct bottleneck of regional economic development and the current state
of the regional economy. Used without improvement, however, the method yields a deterministic causal relationship based
on deterministic information, which is often far from the facts of regional economic bottleneck diagnosis. One remedy is
the NF-GMDH network, which integrates the GMDH network with fuzzy logic so that the network can use linguistic
information as well as numerical data. Each layer of neurons in NF-GMDH is a fuzzy model, and once the inputs of a layer
are determined, the condition part of each rule is determined as well. The parameters of the fuzzy model are estimated by
the Hybrid Projection Method (HPM). External criteria decide which neurons are retained and which are deleted, and the
outputs of the selected neurons form the outputs of the intermediate neurons. When the test-data criterion of a layer no
longer decreases, the optimal fuzzy model has been obtained. NF-GMDH solves the problems of extracting fuzzy rules and
using linguistic information, but it does not solve the problem of testing the rules, so it cannot mine further data and
complete the diagnosis process.


Because the diagnosis of regional economic bottlenecks concerns management problems, the facts encountered in the
diagnosis process are uncertain knowledge that people cannot fully understand. The NF-GMDH method cannot handle this
effectively. We therefore propose a new algorithm, the Fuzzy GMDH-Type method. It makes full use of all the information
obtained during diagnosis while retaining the ability to extract uncertain rules.
Difference between Fuzzy GMDH-Type Method and GMDH Method 


The Fuzzy GMDH-Type method first fuzzifies the data and extracts fuzzy rules, then tests those rules with a neural
network before they are used as rules for diagnosing regional economic bottlenecks. In this way it can extract uncertain
fuzzy rules. The concrete steps are as follows. The original data are divided into a training set and a test set. The
training set is used to estimate the weights of the neural network, and the test set is used to define the membership
functions and organize the structure of the neural network. After training, the parameters of the fuzzy model are
obtained by the HPM method. The neurons to be retained are determined according to the balance-of-variables criterion,
and their outputs serve as the outputs of the intermediate neurons. When the criterion value no longer decreases, the
fuzzy model is obtained. From the training set, n fuzzy rules can be created for regional economic bottlenecks and used
for reasoning.
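The fuzzification and rule-extraction step can be illustrated with a minimal sketch. The paper does not specify the
shape of the membership functions, the linguistic terms, or how rule support is scored, so the triangular memberships,
the three terms (low, medium, high), the min operator, and the function names below are all assumptions made for
illustration only.

```python
import numpy as np

def triangular(x, left, peak, right):
    """Triangular membership: 0 outside [left, right], 1 at peak."""
    x = np.asarray(x, dtype=float)
    rising = (x - left) / (peak - left)
    falling = (right - x) / (right - peak)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

def fuzzify(values):
    """Map a numeric indicator to membership grades for low/medium/high.

    The breakpoints (minimum, midpoint, maximum of the observed data)
    are an illustrative assumption; the data must not be constant.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    mid = (lo + hi) / 2.0
    span = hi - lo
    return {
        "low": triangular(values, lo - span, lo, mid),
        "medium": triangular(values, lo, mid, hi),
        "high": triangular(values, mid, hi, hi + span),
    }

def rule_support(x, y, x_term="high", y_term="high"):
    """Per-sample degree to which 'IF x is x_term THEN y is y_term' holds.

    Uses the min operator as the fuzzy AND (an assumption).
    """
    return np.minimum(fuzzify(x)[x_term], fuzzify(y)[y_term])
```

Calling `rule_support(mileage, competitiveness)` on two indicator columns would give, for each region, how strongly it
supports a candidate rule of the form "if mileage is high, competitiveness is high".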


In contrast, the GMDH algorithm first requires a GMDH input-output model to be constructed. In self-organizing data
mining, prior knowledge can be used directly to select reference functions and external criteria. Self-organizing data
mining generally uses Kolmogorov-Gabor (K-G) polynomials as reference functions. When prior knowledge of the system
(domain expert knowledge) is available, however, it can be used directly to construct specific reference functions that
reflect that knowledge. According to the author's previous research, the variables interact when the GMDH method is
applied to the regional economic bottleneck diagnosis system; the model structure is therefore:

y = a_0 + f_1(x_i) + f_2(x_i, x_j)    (1)
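To make the role of the reference function and the external criterion concrete, the following is a minimal sketch of one
GMDH selection layer. It assumes a second-order K-G polynomial for each pair of inputs, ordinary least squares for the
coefficients, and mean squared error on a held-out set as the external criterion. The function names are hypothetical,
and this is not the procedure implemented in KnowledgeMiner.

```python
import itertools
import numpy as np

def kg_features(xi, xj):
    """Second-order Kolmogorov-Gabor terms for one pair of inputs."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def gmdh_layer(X_train, y_train, X_val, y_val, keep=2):
    """Fit one candidate neuron per input pair and keep the best ones.

    Coefficients are estimated on the training set; candidates are
    ranked by their error on the validation set (external criterion).
    Returns a list of (validation_error, (i, j), coefficients).
    """
    candidates = []
    for i, j in itertools.combinations(range(X_train.shape[1]), 2):
        A = kg_features(X_train[:, i], X_train[:, j])
        coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        pred = kg_features(X_val[:, i], X_val[:, j]) @ coef
        err = float(np.mean((y_val - pred) ** 2))
        candidates.append((err, (i, j), coef))
    candidates.sort(key=lambda c: c[0])  # smallest external error first
    return candidates[:keep]
```

The outputs of the retained neurons would become the inputs of the next layer, and layers would be added until the
external criterion stops decreasing.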


Application of Fuzzy GMDH-Type Method in Regional Economic Diagnosis


Based on regional economic data for China in 2003, a bottleneck model of the regional economy was established using the
GMDH method. The variables related to regional development bottlenecks were selected with the KnowledgeMiner software.





ROC (receiver operating characteristic) refers to the receiver operating characteristic curve, which takes both
sensitivity and specificity into account to evaluate the recognition performance of a classifier comprehensively. As a
quantitative index, the area under the ROC curve can directly and effectively help to optimize classification
thresholds and compare the performance of different classifiers.
The input variables are:

X1 - Illiteracy Rate
X2 - Transaction Volume in Technology Market
X3 - Number of Telephone Users at the End of the Year
X4 - Number of Internet Users
X5 - Railway Operation Mileage
X6 - Ratio of Grade Highway
X7 - Highway Mileage
X8 - Total Import and Export
X9 - Total Retail of Social Commodities
X10 - Government Management Ability
X11 - Relation between Government and Enterprise
X12 - Regional Openness

The output (dependent) variable is:

Y - Regional Competitiveness


Our reasoning rule is that the independent variable with the largest coefficient is the bottleneck of the region's
development. Screening the data with the KnowledgeMiner software yields the following optimal model:


Y = 0.0166216 Z_{11} + 0.0185552 Z_{12} + 0.0716154    (2)
Z_{11} = -1.73994 X_1 - 0.0617410 X_{11} + 0.713224    (3)
Z_{12} = 0.000286295 X_5 - 0.0000144063 X_7 + 0.152434    (4)


For this model, the prediction error sum of squares (PESS) is 0.7352, the average absolute percentage error is 17.56%,
the approximate heteroscedasticity is 0.6424, and the coefficient of determination (R^2) is 0.3576. The output variable
is X13 (i.e., regional competitiveness). The relevant input variables are X1 (illiteracy rate), X11
(government-enterprise relationship), X5 (railway operating mileage) and X7 (highway mileage). Their contributions to
eliminating model error are 33%, 28%, 17% and 22%, respectively. The three optimal models mined by the GMDH algorithm
are illustrated as follows:
Figure 1: Model 1 Obtained by GMDH Algorithm.
Figure 2: Model 2 Obtained by GMDH Algorithm.
Figure 4: ROC of GMDH Algorithm.
Figure 5: Model Scatter Plot of GMDH Algorithm.
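For readers who wish to reproduce fit statistics of the kind quoted for this model, the sketch below computes an average
absolute percentage error and a coefficient of determination from observed and predicted values. The definitions used
here are the standard textbook ones and are assumed, not confirmed, to match those reported by KnowledgeMiner; PESS is
not reproduced because its exact definition is not given in the text. It may be worth noting that the quoted values
0.6424 and 0.3576 sum to 1, which would be consistent with the former being the unexplained fraction 1 - R^2, although
the paper does not state this.

```python
import numpy as np

def mean_abs_pct_error(y_true, y_pred):
    """Average absolute percentage error, in percent (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```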


From the model data and Figs. 1-3, it can be seen that the model obtained by this algorithm is adequate for further
rule extraction. The rule is: if the railway mileage is high (low), the regional competitiveness will be high (low).
This indicates that railway mileage was a major bottleneck of regional competitiveness in 2003.
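The two-layer model of Equations (2)-(4) can be evaluated directly and collapsed into a single linear expression, which
makes the direction of each input's effect visible. The sketch below assumes that the garbled subscripts in the source
equations correspond to X1 and X11 in Equation (3) and to X5 and X7 in Equation (4), following the variables named in
the text; the function names are hypothetical. The raw coefficients depend on each variable's units, so the sketch is
meant only to show the composition and the signs.

```python
def z11(x1, x11):
    """Intermediate neuron of Equation (3) (subscripts assumed)."""
    return -1.73994 * x1 - 0.0617410 * x11 + 0.713224

def z12(x5, x7):
    """Intermediate neuron of Equation (4) (subscripts assumed)."""
    return 0.000286295 * x5 - 0.0000144063 * x7 + 0.152434

def competitiveness(x1, x11, x5, x7):
    """Output of Equation (2)."""
    return 0.0166216 * z11(x1, x11) + 0.0185552 * z12(x5, x7) + 0.0716154

def composed_coefficients():
    """Substitute (3) and (4) into (2) to get one linear model in X1, X11, X5, X7."""
    a, b, c = 0.0166216, 0.0185552, 0.0716154
    return {
        "X1": a * -1.73994,
        "X11": a * -0.0617410,
        "X5": b * 0.000286295,
        "X7": b * -0.0000144063,
        "intercept": a * 0.713224 + b * 0.152434 + c,
    }
```

Under these assumptions the composed coefficient on X5 (railway operating mileage) is positive, which agrees with the
direction of the rule stated above.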


Then the data from 2001 to 2005 are tested with an artificial neural network. All data samples that can be explained by
the above rule are deleted from the full data set, and the data set remaining after deletion is used as the training
sample set to construct a diagnostic neural network model. Several diagnostic models can be constructed by repeatedly
mining the data of different years in this way. Once the diagnostic models have been constructed, the data set covering
1995 to 2005 is tested. When the error of a diagnosis result is below a certain threshold, the corresponding data
samples are deleted from the data set; this indicates that the fault information hidden in the large volume of deleted
data can be expressed by the diagnostic model. The small amount of data that is not deleted is stored in the case base
after noise has been removed through an interactive test by the user. This case knowledge expresses special bottleneck
knowledge that differs from the general diagnostic knowledge model. It should be organized as a binary tree according to
its abnormal characteristics in order to facilitate case retrieval for case-based bottleneck diagnosis.
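The threshold-based deletion step can be sketched as follows. This is a simplified illustration that assumes each
diagnostic model is a callable returning a predicted value per sample and that the diagnosis error is the absolute
difference between prediction and observation; the function name and the error measure are assumptions, and the
interactive noise removal and binary-tree organization of the case base are not shown.

```python
import numpy as np

def residual_cases(X, y, models, threshold):
    """Return the samples that no diagnostic model explains.

    A sample counts as explained (and is dropped) when at least one
    model predicts it with absolute error below the threshold. The
    samples that remain are candidates for the case base.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    remaining = np.ones(len(y), dtype=bool)
    for model in models:
        explained = np.abs(y - model(X)) < threshold
        remaining &= ~explained
    return X[remaining], y[remaining]
```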


In further work, the data processing function of KnowledgeMiner can be used to fuzzify the data. Using the command for
creating a fuzzy input-output model, the following rule is obtained after processing the 2003 data: if the illiteracy
rate is not high or low and the railway transportation mileage is high (low), then the regional competitiveness is high
(low). The total absolute error of the rule is 1.70, the average absolute error is 6.18%, and the approximate
heteroscedasticity is 13.1815. The specific model diagrams are shown in Figures 7-9.
Figure 7: Optimal Model 1 Obtained by GMDH Algorithm.
Figure 8: Optimal Model 2 Obtained by Fuzzy GMDH Algorithm.
Figure 9: Optimal Model 3 Obtained by Fuzzy GMDH Algorithm.


Similarly, an artificial neural network is used to test the data from 2001 to 2005, and several diagnostic models are
constructed. The other steps are the same.
CONCLUSIONS


In this paper, a new self-organizing data mining algorithm, the Fuzzy GMDH-Type method, is proposed and successfully
applied to the diagnosis of regional economic bottlenecks. The method solves the problem of identifying regional
economic bottlenecks. A comparison with the classical GMDH algorithm shows that the recognition accuracy of the method
is high and that new preconditions are added to the extracted rules.


This study provides a basic operating scheme for the diagnosis of regional economic bottlenecks. In subsequent work the
scheme will be combined with a reverse reasoning model to form a set of diagnostic procedures for regional economic
bottlenecks, thus laying a relatively solid foundation for the scientific evaluation, monitoring and diagnosis of the
regional economy.
APPENDIX


Regional Economic Development of 26 Provinces (Municipalities) randomly selected in 2003 
* The indicators used in this paper are research results of the author from the key projects of the National Social
Science Fund, and the regional competitiveness scores are research results of a Liaoning Province Department of
Education project led by the author.